Christian Duffau-Rasmussen
03-10-2021
A procedure to combine many weak learners to produce a powerful committee.
(J. Friedman et al. 2009, sec. 10.1)
(Valiant 1984)
(Kearns and Valiant 1989)
strongly learnable == PAC learnable
A concept class is weakly learnable if some learner does just slightly better than random guessing
(R. E. Schapire 1990)
(Freund 1990)
(R. Schapire and Freund 1995)
(J. H. Friedman 2001) (Mason et al. 1999)
(Chen and He 2015)
\[\text{Boosting} \succ \text{Random forest} \succ \text{Bagging} \succ \text{Tree}\]
\[Y = \begin{cases} 1 & \text{if}\quad X_1^2 + \ldots + X_{10}^2 > 9.34 \\ -1 & \text{else} \end{cases}\quad X_i\sim N(0,1)\]
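This is the "nested spheres" example: the sum of ten squared standard normals is $\chi^2_{10}$, whose median is roughly 9.34, so the two classes are balanced by construction. A minimal simulation (NumPy only; sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 10

# Ten independent standard normal features
X = rng.standard_normal((n, p))

# Label +1 inside the "nested spheres" boundary, -1 otherwise;
# 9.34 is approximately the median of a chi-squared(10) variable
y = np.where((X ** 2).sum(axis=1) > 9.34, 1, -1)

print((y == 1).mean())  # close to 0.5: the classes are balanced
```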
\[\text{MSE} = \text{Bias}(\hat{f})^2 + \text{Var}(\hat{f}) + \text{Irreducible noise}\]
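The decomposition can be checked numerically. A small sketch with made-up numbers: estimate a mean with a deliberately shrunk (biased) estimator, predict a fresh observation, and compare the simulated MSE against bias squared plus variance plus irreducible noise.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, n, c = 2.0, 1.0, 10, 0.5   # made-up values; c < 1 shrinks
sims = 200_000

data = rng.normal(theta, sigma, (sims, n))
est = c * data.mean(axis=1)              # deliberately biased estimator

y_new = rng.normal(theta, sigma, sims)   # fresh observation to predict
mse = ((y_new - est) ** 2).mean()

bias2 = (c * theta - theta) ** 2         # squared bias of the estimator
var = (c ** 2) * sigma ** 2 / n          # variance of the estimator
noise = sigma ** 2                       # irreducible noise in y_new
print(mse, bias2 + var + noise)          # the two should nearly agree
```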
\[\text{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \text{Var}\left(X_i\right) + \frac{1}{n^2}\sum_{i\neq j} \text{Cov}(X_i, X_j)\]
\[\text{Variance of ensemble} = \frac{\text{Var(Trees)}}{n} + \frac{n-1}{n}\,\text{Cov(Trees)}\]
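A quick NumPy check of this (the correlation, ensemble size, and simulation count are made-up numbers): averaging n identically distributed estimators with unit variance and pairwise covariance rho shrinks the variance term but leaves the covariance floor.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trees, rho, n_sims = 20, 0.6, 200_000

# Correlated "tree predictions": a shared component plus private noise,
# so each prediction has variance 1 and pairwise covariance rho
shared = rng.standard_normal((n_sims, 1))
noise = rng.standard_normal((n_sims, n_trees))
trees = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise

ensemble = trees.mean(axis=1)            # the averaged prediction
empirical = ensemble.var()
theory = 1 / n_trees + (n_trees - 1) / n_trees * rho

print(empirical, theory)  # averaging kills Var/n, not the Cov term
```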
🤷♂️
Scikit-learn documentation:
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions.
Sounds promising…
loss: {‘deviance’, ‘exponential’}
For loss ‘exponential’ gradient boosting recovers the AdaBoost algorithm.
🤦♂️
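The equivalence is easy to see in code: a hand-rolled AdaBoost.M1 over decision stumps (a minimal NumPy sketch on made-up 2-D nested-spheres data, not the scikit-learn implementation) reweights each point by the exponential loss of the current additive model, which is exactly stage-wise fitting under loss ‘exponential’.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-D nested-spheres data; 1.39 is roughly the median of chi2(2)
X = rng.standard_normal((500, 2))
y = np.where((X ** 2).sum(axis=1) > 1.39, 1, -1)

def fit_stump(X, y, w):
    """Best weighted decision stump over all features, thresholds, signs."""
    best = (None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if err < best[1]:
                    best = ((j, t, s), err)
    return best

w = np.full(len(y), 1 / len(y))          # uniform initial weights
F = np.zeros(len(y))                     # the additive model, starts at 0
for _ in range(30):
    (j, t, s), err = fit_stump(X, y, w)
    err = max(err, 1e-12)                # guard against log(1/0)
    alpha = 0.5 * np.log((1 - err) / err)
    pred = np.where(X[:, j] > t, s, -s)
    F += alpha * pred
    w *= np.exp(-alpha * y * pred)       # exponential-loss reweighting
    w /= w.sum()

train_err = (np.sign(F) != y).mean()
print(train_err)  # far below the ~0.38 error of the best single stump
```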